Polylingual Tree-Based Topic Models for Translation Domain Adaptation

نویسندگان

  • Yuening Hu
  • Ke Zhai
  • Vladimir Eidelman
  • Jordan L. Boyd-Graber
چکیده

Topic models, an unsupervised technique for inferring translation domains improve machine translation quality. However, previous work uses only the source language and completely ignores the target language, which can disambiguate domains. We propose new polylingual tree-based topic models to extract domain knowledge that considers both source and target languages and derive three different inference schemes. We evaluate our model on a Chinese to English translation task and obtain up to 1.2 BLEU improvement over strong baselines.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Topic Models for Translation Domain Adaptation

Topic models have been successfully applied in domain adaptation for translation models. However, previous works applied topic models only on source side and ignored the relations between source and target languages in machine translation. This paper corrects this omission by learning models that can also use targetside information to discover more distinct topics: tree-based topic models and p...

متن کامل

Expressive Knowledge Resources in Probabilistic Models

Title of dissertation: EXPRESSIVE KNOWLEDGE RESOURCES IN PROBABILISTIC MODELS Yuening Hu, Doctor of Philosophy, 2014 Dissertation directed by: Professor Jordan Boyd-Graber iSchool, Umiacs Understanding large collections of unstructured documents remains a persistent problem. Users need to understand the themes of a corpus and to explore documents of interest. Topic models are a useful and ubiqu...

متن کامل

Polylingual Topic Models

Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive collections of interlinked documents in dozens of languages, such as Wikipedia, are now widely available, calling for tools that can characterize content in many languages. We introduce a polylingual topic model that discov...

متن کامل

Rapid Unsupervised Topic Adaptation – a Latent Semantic Approach

In open-domain language exploitation applications, a wide variety of topics with swift topic shifts has to be captured. Consequently, it is crucial to rapidly adapt all language components of a spoken language system. This thesis addresses unsupervised topic adaptation in both monolingual and crosslingual settings. For automatic speech recognition we rapidly adapt a language model on a source l...

متن کامل

Learning Polylingual Topic Models from Code-Switched Social Media Documents

Code-switched documents are common in social media, providing evidence for polylingual topic models to infer aligned topics across languages. We present Code-Switched LDA (csLDA), which infers language specific topic distributions based on code-switched documents to facilitate multi-lingual corpus analysis. We experiment on two code-switching corpora (English-Spanish Twitter data and English-Ch...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014